Hybrid is a formal theory implemented in Isabelle/HOL that provides an interface for representing 
and reasoning about object languages using higher-order abstract syntax (HOAS). This interface is 
built around an HOAS variable-binding operator that is constructed definitionally from a de Bruijn 
index representation. In this paper we make a variety of improvements to Hybrid, culminating in 
an abstract interface that on one hand makes Hybrid a more mathematically satisfactory theory, and 
on the other hand has important practical benefits. We start with a modification of Hybrid's type of 
terms that better hides its implementation in terms of de Bruijn indices, by excluding at the type level 
terms with dangling indices. We present an improved set of definitions, and a series of new lemmas 
that provide a complete characterization of Hybrid's primitives in terms of properties stated at the 
HOAS level. Benefits of this new package include a new proof of adequacy and improvements to 
reasoning about object logics. Such proofs are carried out at the higher level with no involvement of 
the lower level de Bruijn syntax. 

1 Introduction 

Hybrid is a system developed to specify and reason about logics, programming languages, and other 
formal systems expressed in higher-order abstract syntax (HOAS). It is implemented as a formal theory 
in Isabelle/HOL [15]. By providing HOAS in a modern proof assistant, Hybrid automatically gains 
the latter's capabilities for meta-theoretical reasoning. This approach is intended to provide advantages 
in flexibility and proof automation, in contrast to systems that directly implement logical frameworks, 
which must build their own meta-reasoning layers from the ground up. Building a system such as Hybrid 
within a general purpose theorem prover poses a variety of challenges. Our goal in this work is to improve 
the implementation and interface of Hybrid's basic theory, bringing it to a point where its potential 
advantages can be more fully realized. 

Using HOAS, binding constructs in the represented language (the object logic or OL) are encoded 
using the binding constructs provided by an underlying λ-calculus or function space of the meta-logic, 
thus representing the arguments of these constructs as functions of the meta-level. Isabelle/HOL 
implements an extension of higher-order logic, where the function types are "too large" for HOAS in two 
senses. First, they contain elements with irreducible occurrences of logical constants, which do not 
represent syntax. Second, the function space τ ⇒ τ has larger cardinality than τ, so a variable-binding 
operator represented as a functional Φ of type (τ ⇒ τ) ⇒ τ cannot be injective. This makes it unsuitable 
for syntax, for we cannot uniquely recover the argument F from a term of the form Φ(F). Our work 
builds directly on the original Hybrid system [1], whose solution to both problems is to use only a subset 
of the function type, identified by a predicate called abstr. It builds a type expr of terms with an HOAS 
variable-binding operator definitionally in terms of a de Bruijn index representation. 
In earlier work joint with Alberto Momigliano, we gave a system presentation of Hybrid [14], which 
built on the original Hybrid and serves as a starting point for the work presented here. In this paper, we 
fill in many details that could not be described in a short system description, as well as make signifi- 
cant further improvements, allowing us to complete a characterization of Hybrid's type expr in terms of 
properties stated at the HOAS level. In the new Hybrid, the type expr, its constructors, and these prop- 
erties form an abstract interface that allows users to reason at the higher level with no involvement of 
the lower level implementation details. This interface was motivated by and is illustrated by a new proof 
of representational adequacy for Hybrid [12, Sect. 3.4] that does not make any reference to de Bruijn 
syntax. 

We start in Sect. 2 by giving an abstract view of Hybrid that motivates and explains the interface. 
Sections 3-7 fill in many of the details of its implementation. The type dB implementing the de Bruijn 
index representation is defined in Sect. 3, along with a predicate level to keep track of dangling indices. 
The original Hybrid [1] used a datatype corresponding to our dB directly as expr. Section 4 defines the 
new version of expr, which excludes at the type level terms with dangling indices. This simplifies the 
representation of object languages by eliminating the need to carry a predicate for this purpose (called 
proper in [1]) along with Hybrid terms in meta-theoretic reasoning. Section 5 defines Hybrid's variable 
binding operator LAM and the abstr predicate. These definitions support a stronger injectivity prop- 
erty, presented in Sect. 6 with only one abstr premise rather than two. This property was also proved 
in [14]; the results here generalize and simplify these definitions as well as simplify other related Hybrid 
internals. (In particular, we eliminate the need for the auxiliary function dB_fn defined in [14] using the 
function package first introduced in Isabelle/HOL 2007, and we eliminate some other auxiliary functions 
by using a more systematic treatment of level.) 

In Sect. 7, we formally prove that a version of abstr for two-argument functions (as described in [13]) 
is equivalent to a conjunction of one-argument abstr conditions on "slices" of the function (fixing one 
argument). We use this result to prove a case-distinction lemma for functions satisfying abstr, and a 
lemma that enables compositional proof of abstr conditions at the HOAS level, without conversion to de 
Bruijn indices as required in [1]. These two lemmas represent important new results that complete the 
abstract interface for Hybrid. 

In Sect. 8, we discuss related work as well as ongoing work with Hybrid. 

The Isabelle/HOL 2011 theory file for the present version of Hybrid is available online at: 

http://hybrid.dsi.unimi.it/download/Hybrid.thy 

and a more thorough presentation can be found in the first author's Ph.D. thesis [11, 12]. In addition 
to the results described here, this theory file also replaces tactic-style proofs of the original version of 
Hybrid with Isar proofs. This style of proof is both more readable and more robust against changes to the 
underlying proof assistant. It also includes rewrite rules for Isabelle's simplifier to convert automatically 
between HOAS at type expr and de Bruijn indices at type dB. With the improvements allowing users to 
work exclusively at the HOAS level, this is no longer needed, and only included for illustrative purposes. 

2 An Abstract View of Hybrid 

We use a pretty-printed version of Isabelle/HOL concrete syntax in this and the following sections. A 
double colon :: separates a term from its type, and the arrow ⇒ is used in function types. We stick to the 
usual logical symbols for connectives and quantifiers (¬, ∧, ∨, ⟶, ∀, ∃). Free variables (upper-case) 
are implicitly universally quantified (from the outside). The sign ≡ (Isabelle meta-equality) is used for 
equality by definition, and ⟹ for Isabelle meta-level implication. In the notation ⟦ P₁; …; Pₙ ⟧ ⟹ P, 
the square brackets are used to group premises to abbreviate nested implications; in its expanded form, 
it is P₁ ⟹ … ⟹ Pₙ ⟹ P. Similarly, [ t₁, …, tₙ ] ⇒ t abbreviates the type t₁ ⇒ ⋯ ⇒ tₙ ⇒ t. 
The keyword datatype introduces a new datatype, while function introduces a recursively defined 
function. We freely use infix notations, often without explicit declarations. Other syntax is introduced as 
it appears. 

Isabelle/HOL already has extensive support for first-order abstract syntax, in the form of its datatype 
package. Hybrid may be viewed as an attempt to approximate a datatype definition that is not well- 
formed because of its higher-order features: 

datatype expr = CON con | VAR var | APP expr expr   (notation (s $$ t)) 
              | LAM (expr ⇒ expr)                    (notation (LAM x. B)) 

where CON represents constants, from an OL-specific type con (typically a trivial datatype); VAR may 
be used to represent free variables, from a countably infinite type var (actually a synonym for nat); 
APP represents pairing, which is sufficient to encode list- or tree-structured syntax; and LAM represents 
variable binding in HOAS style, using the bound variable of an Isabelle/HOL λ-abstraction to represent 
a bound variable of the object language.¹ 

It should be noted that Hybrid only approximates one such pseudo-datatype, not the datatype package 
with its ability to define multiple types for first-order abstract syntax. That is, Hybrid is untyped, so 
predicates rather than types must be used to distinguish different kinds of OL terms encoded into expr. 

The problem with the above definition is LAM, whose argument type includes a negative occurrence 
of expr (the expr to the left of the arrow in LAM's argument type). This is essential for HOAS, but it is not permitted in a datatype definition 
[16, Sect. 2.6], and it will require modifications to some of the properties expected for a constructor of a 
datatype; we will return to this issue later. 

Hybrid does provide a type expr with operators CON, VAR, APP, and LAM of the appropriate types. 
This type and the latter three operators can be used directly as a representation of the untyped λ-calculus. 
When encoding OLs in general, however, it is usual to represent each OL construct as a list built using 
$$ and headed by a CON term identifying the particular construct. To illustrate this idea, we take the 
untyped λ-calculus as our OL with its usual named-variable syntax, using capital letters for variables 
(Vᵢ, i ∈ ℕ) and λ-abstraction (Λ) to avoid confusion with Isabelle's λ operator. In this form, an object 
language term (Λ V₁. Λ V₂. (V₁ V₂) V₃), for example, can be represented as 

c_lam $$ (LAM x. c_lam $$ (LAM y. c_app $$ (c_app $$ x $$ y) $$ VAR 3)), 

where c_lam = CON c₁ and c_app = CON c₂ for distinct constants c₁, c₂ :: con. We may use Isabelle's 
ability to define abbreviations and infix notations to recover a reasonable concrete syntax: 

fn x y. (x $ y) $ VAR 3. 

Note that although de Bruijn indices do not appear in such terms, numbers can appear as arguments to 
Hybrid's VAR operator, which is included to allow a representation of free variables that is distinct from 
bound variables. 
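
The encoding described above can be sketched outside Isabelle. The following Python fragment is only an illustration, not Hybrid's implementation: it models LAM with a Python closure, and the class names, the helper app, the constant names "lam" and "app", and the assumption that $$ nests to the left are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical Python stand-ins for Hybrid's CON, VAR, APP, and LAM.
@dataclass(frozen=True)
class CON:
    name: str

@dataclass(frozen=True)
class VAR:
    n: int

@dataclass(frozen=True)
class APP:
    left: object
    right: object

@dataclass(frozen=True)
class LAM:
    body: Callable[[object], object]  # HOAS: the binder is a meta-level function

# Named-variable object language: variables V_i, application, abstraction.
@dataclass(frozen=True)
class OVar:
    i: int

@dataclass(frozen=True)
class OApp:
    fun: object
    arg: object

@dataclass(frozen=True)
class OLam:
    i: int
    body: object

C_LAM = CON("lam")  # plays the role of c_lam
C_APP = CON("app")  # plays the role of c_app

def app(head, *args):
    """Build head $$ a1 $$ ... $$ an, assuming $$ nests to the left."""
    term = head
    for a in args:
        term = APP(term, a)
    return term

def encode(t, env: Optional[Dict[int, object]] = None):
    """Translate an object-language term; env maps bound names to meta-level values."""
    env = {} if env is None else env
    if isinstance(t, OVar):
        # Bound variables become meta-level variables; free ones become VAR i.
        return env.get(t.i, VAR(t.i))
    if isinstance(t, OApp):
        return app(C_APP, encode(t.fun, env), encode(t.arg, env))
    if isinstance(t, OLam):
        return app(C_LAM, LAM(lambda x: encode(t.body, {**env, t.i: x})))
    raise TypeError(f"not an object-language term: {t!r}")

# The example term from the text: Λ V1. Λ V2. (V1 V2) V3
example = OLam(1, OLam(2, OApp(OApp(OVar(1), OVar(2)), OVar(3))))
encoded = encode(example)
```

Applying the two nested LAM bodies of encoded to arbitrary terms p and q yields the representation of (p q) V₃, which mirrors how a bound variable in HOAS is a placeholder for an arbitrary term of type expr.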

We now turn to the properties required of expr and its operators to function as HOAS. We motivate 
the requirements by considering adequacy, an important meta-theoretic property. This can take several 
forms, but the proof presented in [12] uses bijectivity of a set-theoretic semantics on a λ-calculus-like 
subset of the Isabelle/HOL terms of type expr, called the syntactic terms: 

s ::= x | CON a | VAR n | s₁ $$ s₂ | LAM x. s 

¹While APP and LAM were inspired by the untyped λ-calculus, in Hybrid they are used only as syntax, without built-in 
notions of β-conversion, normal forms, etc. 

where s (with possible subscripts) stands for a syntactic term, x for a variable of type expr, a for a constant 
of type con, and n for a natural-number constant. Note that this is an informal mathematically defined set; 
it is not a formal Isabelle/HOL definition. 

However, open terms present a complication. Suppose we have a theory where the semantics is 
bijective on closed syntactic terms, which it maps to a set S. Then it will map open terms with n free 
variables to functions from the Cartesian power Sⁿ to S. But there are many such functions that do not 
correspond to syntactic terms; for example, the function S → S corresponding to the Isabelle/HOL term 

λ x. if (∃ a. x = CON a) then (x $$ x) else x 

of type (expr ⇒ expr). Indeed, there are a countable infinity of syntactic terms, while the set of functions 
from Sⁿ to S is uncountable for n ≥ 1. 

Thus, Hybrid must define a subset of the function space to be used as its representation for open 
syntactic terms. This is done using a predicate abstr :: ((expr ⇒ expr) ⇒ bool). The functions satisfying 
abstr will be those of the form (λ x. s) where s is a syntactic term with (at most) one free variable x; we 
call these the syntactic functions.² (Syntactic terms with more than one free variable can be handled one 
variable at a time.) 

In the first-order case, three properties hold of a type defined using Isabelle/HOL's datatype: dis- 
tinctness of the datatype constructors, injectivity of each constructor, and an induction principle. In the 
case of Hybrid, distinctness of all the operators and injectivity of the first-order operators (i.e., all except 
LAM) are straightforward to achieve, e.g.: 

∀ (c :: con) (S :: expr ⇒ expr). CON c ≠ LAM S 

∀ (s t s' t' :: expr). (s $$ t = s' $$ t') ⟶ (s = s') ∧ (t = t'). 

(These properties are used as rewrite rules for Isabelle's simplifier, to reduce equalities of Hybrid terms 
with known operators on both sides; typically this results in equalities where one side is just an Isabelle/ 
HOL variable, which can then be eliminated by substitution.³) 

Injectivity of LAM must be restricted to functions satisfying abstr; indeed, it can be proven in 
Isabelle/HOL that no injective function from (expr ⇒ expr) to expr exists, by formalizing Cantor's 
diagonal argument. As mentioned earlier, our improved version requires an abstr condition for only one 
side of the equality: 

⟦ abstr S ∨ abstr T; LAM S = LAM T ⟧ ⟹ S = T. 

Requiring only a single condition reduces the need for explicit abstr conditions in object-language en- 
codings, because they can be transported across equalities of LAM terms. It is achieved by adding to 
the type expr an additional constant ERR, and defining LAM to take the value ERR on functions not 
satisfying abstr. (The constant ERR will sometimes appear as an additional case alongside the operators 
of Hybrid, in lemmas that impose an abstr condition for the LAM case. We also include it among the 
syntactic terms.) 

Since abstr appears as a premise of injectivity — and it would in any case be needed to state proper- 
ties of open syntactic terms — we must also include properties sufficient to characterize it. While Hybrid 

²Previous work called such functions abstractions [1], thus the predicate name abstr; and called functions not satisfying 
abstr exotic terms [1, 5]. 

³Indeed, most use of Hybrid's lemmas in object-language work is automated using Isabelle's simplifier and classical 
reasoner, and as a result, direct references to Hybrid's lemmas may be rare. 
proves a number of lemmas regarding abstr for convenience and proof automation, the desired 
characterization can be given in a single statement: 





abstr Y = ( (Y = (λ x. x)) ∨ 
            (∃ a. Y = (λ x. CON a)) ∨ 
            (∃ n. Y = (λ x. VAR n)) ∨ 
            (∃ S T. Y = (λ x. S x $$ T x) ∧ abstr S ∧ abstr T) ∨ 
            (∃ W. Y = (λ x. LAM y. W x y) ∧ abstr W) ∨ 
            (Y = (λ x. ERR)) ) 



Once again the LAM case complicates matters: the occurrence of (abstr W) in the LAM case applies 
abstr to a function W :: ([expr, expr] ⇒ expr). This should be possible by using type classes to give a 
polymorphic definition for abstr, but that is future work. The present version of Hybrid instead replaces 
(abstr W) with (∀ y. abstr (λ x. W x y)) ∧ (∀ x. abstr (λ y. W x y)). 

As for induction, it can take several forms. First, a kind of size induction on expr is available, 
similar to size induction for types defined by Isabelle/HOL's datatype package. This induction has limited 
applicability in the higher-order setting, although it was used in the proof of adequacy [12]. We also retain 
an induction principle from the original version of Hybrid [1] where the first-order induction cases are 
standard, while the LAM case is: 

∀ S :: (expr ⇒ expr). abstr S ∧ (∀ n. P (S (VAR n))) ⟶ P (LAM x. S x). 

A common form of induction used in many case studies involves some form of structural induction 
on the encoding of the inference rules of an OL. For this kind of reasoning, a two-level approach is 
adopted, similar in spirit to other systems such as Twelf [18] and Abella [10]. An intermediate layer 
between the meta-logic (Isabelle/HOL) and the OL, called a specification logic, is defined inductively 
in Isabelle/HOL. This middle layer allows succinct and direct encodings of object logic inference rules, 
which are also defined as inductive definitions. Successful applications of this kind of induction can be 
found in [8, 12], for example. 

Finally, Hybrid aims to build expr and its operators definitionally in Isabelle/HOL. While the de- 
scription above is an informal but reasonably complete specification of Hybrid, it is not directly usable 
as a definition because it is circular: the arguments of LAM and abstr may themselves contain LAM, 
and injectivity of LAM depends on abstr. It could be formalized as an axiomatic theory, leaving consistency 
as a meta-theoretical problem; but instead, Hybrid is built definitionally in terms of a first-order 
representation of variable binding based on de Bruijn indices. The definitions and lemmas involved in 
achieving this are the subject of the next sections. 

3 De Bruijn syntax 

The Hybrid theory defines the type expr in terms of an Isabelle/HOL datatype dB, which represents 
abstract syntax using a nameless first-order representation of bound variables called de Bruijn indices [2]. 

This approach differs from the original version of Hybrid [1], which used a datatype corresponding to 
our dB directly as expr; the significance of this difference will be explained in Sections 4 and 6. However, 
the datatype itself is very similar, and this section follows [1] closely. 

Definition 1 
types 
    var = nat 
    bnd = nat 

datatype α dB = 
    CON' α | VAR' var | APP' (α dB) (α dB)   (notation (s $$' t)) 
  | ERR' | BND' bnd | ABS' (α dB) 

The constructors CON', VAR', and APP' correspond to the operators CON, VAR, and APP on type 
expr, which were discussed in Sect. 2 and will be defined later. The one significant difference is that the 
argument of CON' is a type parameter α, rather than a particular type con. This will actually be true for 
CON as well, and it allows Hybrid to be defined as an OL-independent Isabelle/HOL theory, and later 
used with OL-specific constants. (We will frequently omit this type parameter, except where it occurs in 
formal definitions or it is instantiated.) 

The other three constructors (ERR', BND', and ABS') will all be used in the definition of LAM. The 
constant ERR' will be a placeholder for LAM applied to a non-syntactic function; it was not present 
in [1], and its significance will be explained later. The constructor ABS' functions as a nameless binder, 
while (BND' i) represents the variable implicitly bound by the (i + 1)-th enclosing ABS' node. If there 
are not enough ABS' nodes, then it is called a dangling index. 

As an example, consider the term 

ABS' (ABS' (BND' 2 $$' BND' 1 $$' BND' 0) $$' BND' 0). 

The occurrence of (BND' 1) and the final occurrence of (BND' 0) both refer to the variable bound by the outer 
ABS', while the other occurrence of (BND' 0) refers to the variable bound by the inner 
ABS'. (BND' 2) is a dangling index, because there are only 2 enclosing ABS' nodes. 

To keep track of dangling indices, we define a predicate level :: [ bnd, dB ] ⇒ bool such that (level i t) 
is true if enclosing the term t in i or more ABS' nodes would result in a term without dangling indices. 
(We omit the formal definition, which is straightforward.) A term with no dangling indices is called 
proper, and we may define an abbreviation (proper t) ≡ (level 0 t). These notions are standard for 
abstract syntax based on de Bruijn indices [1]. 
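
Since the formal definition of level is omitted, the following Python sketch shows one plausible reading of it. It is an assumption made for illustration, not a transcription of Hybrid's Isabelle definition; the class names mirror the dB constructors, and the left-nested grouping of $$' in the example term is likewise assumed (it does not affect the result).

```python
from dataclasses import dataclass

# Hypothetical Python stand-ins for the constructors of the datatype dB.
@dataclass(frozen=True)
class Con:
    value: str

@dataclass(frozen=True)
class Var:
    n: int

@dataclass(frozen=True)
class App:
    left: object
    right: object

@dataclass(frozen=True)
class Err:
    pass

@dataclass(frozen=True)
class Bnd:
    index: int

@dataclass(frozen=True)
class Abs:
    body: object

def level(i, t):
    """True if wrapping t in i ABS' nodes would leave no dangling index."""
    if isinstance(t, Bnd):
        return t.index < i          # needs index + 1 enclosing binders
    if isinstance(t, App):
        return level(i, t.left) and level(i, t.right)
    if isinstance(t, Abs):
        return level(i + 1, t.body)  # one more binder is available inside
    return True                      # Con, Var, Err contain no indices

def proper(t):
    """A term is proper when it has no dangling indices at all."""
    return level(0, t)

# ABS' (ABS' (BND' 2 $$' BND' 1 $$' BND' 0) $$' BND' 0)
example = Abs(App(Abs(App(App(Bnd(2), Bnd(1)), Bnd(0))), Bnd(0)))
```

Under this reading the example term is not proper, because of (BND' 2), but it satisfies level 1: a single additional enclosing ABS' node would bind the dangling index.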

4 The type "expr" of proper de Bruijn terms 

Defining a type designed specifically to represent syntax has been used in a variety of approaches to 
reasoning about the λ-calculus and other object logics (e.g. [17, 22]). Here, we use Isabelle/HOL's 
typedef mechanism to define expr as a bijective image of the set of proper terms of type dB.⁴ That 
eliminates the proper conditions in object-language work using Hybrid, at the expense of having to 
convert terms between expr and dB in defining LAM and abstr. This is a good trade-off, because those 
definitions are internal to Hybrid and need only be made once. It also turns out to be essential for 
strengthening the quasi-injectivity property of LAM, as described in Sect. 6. 

Definition 2 

typedef (open) α expr = {x :: α dB. level 0 x} morphisms dB expr 

This typedef statement first demands a proof that the specified set is nonempty (which is trivial here). 
Then it introduces the type expr, the functions dB :: (expr ⇒ dB) and expr :: (dB ⇒ expr), and axioms 
stating that they are inverse bijections between the type expr and the set {x :: dB. level 0 x}. (Although 
axioms are used, the overall mechanism is a form of definitional extension and preserves consistency of 
the theory.) 



⁴The version of expr presented here is a modification of the one used in [14]. 






An Improved Implementation and Abstract Interface for Hybrid 



We may now define all of the first-order operators of Hybrid (i.e., all except LAM, with its functional- 
type argument) in the obvious way. 

Definition 3 

CON :: α ⇒ α expr                     CON a ≡ expr (CON' a) 
VAR :: var ⇒ α expr                   VAR n ≡ expr (VAR' n) 
APP :: [α expr, α expr] ⇒ α expr      s $$ t ≡ expr (dB s $$' dB t) 
       (notation (s $$ t)) 
ERR :: α expr                         ERR ≡ expr ERR' 

ERR is defined as if it were a separate operator, and it will sometimes be treated as such, but it will 
also be generated by LAM applied to a non-syntactic function. 

The functions dB and expr translate these operators to the corresponding constructors of dB 
(Definition 1) and vice versa. This is formalized by a set of lemmas that follow straightforwardly from the 
definitions, of which we present just those for APP ($$) as an example. 

Lemma 4 

dB (s $$ t) = dB s $$' dB t 

⟦ level 0 s; level 0 t ⟧ ⟹ expr (s $$' t) = expr s $$ expr t 

Distinctness and injectivity for these operators follow from the corresponding properties of dB. In 
Sect. 6, we will extend these results to LAM as well. 

The (level 0) premises in the lemma above are needed because the typedef-generated function expr is 
undefined on terms with dangling indices. These premises could be eliminated by defining a more tightly- 
specified version of expr, satisfying the same typedef-generated axioms while preserving the structure 
of its argument except for any dangling indices. This was done in the previous version of Hybrid [14] 
(with the help of an auxiliary function called trim). However, with a more systematic treatment of level 
and some additional lemmas for it, this was found to be unnecessary. 

All versions of Hybrid follow a general pattern of making definitions and proving lemmas first for 
arbitrary levels, and then deriving the desired results for proper terms as corollaries. In the present 
version, arbitrary levels are handled by recursion and induction over de Bruijn syntax, using the type dB 
and the predicate level, while the results for proper terms are stated at type expr. 



5 Definition of "abstr" and "LAM" 

We now turn to the task of defining abstr and LAM. The main ideas are from [1], but the details of the 
definitions and proofs are original. There are some improvements over the original version of Hybrid, 
which will be described in this section and Sect. 6. 

Since we will be defining abstr and LAM in terms of de Bruijn syntax, the definition of syntactic 
functions from Sect. 2 is not directly usable here: we need an analogous definition using de Bruijn 
syntax in place of LAM. 

For recursion, we must work with dB-valued functions (arbitrary levels) rather than expr-valued 
functions. However, the argument type need not also be dB, and in fact it will be more convenient to 
work with functions of type (expr ⇒ dB). This simplifies the treatment of level by avoiding negative 
occurrences of the type dB. 

Thus we define the syntactic dB-terms, as a subset of Isabelle/HOL terms of type dB, using variables 
of type expr converted via dB: 

s ::= dB x | CON' a | VAR' n | s₁ $$' s₂ | ERR' | BND' j | ABS' s 

where s (with possible subscripts) stands for a syntactic dB-term, x for a variable of type expr, a for a 
constant of type con, and n and j for natural-number constants. We define the syntactic dB-functions as 
the functions of type (expr ⇒ dB) of the form (λ x. s), where s is a syntactic dB-term with (at most) one 
free variable x. Such functions mix de Bruijn indices (BND') with HOAS (using the Isabelle/HOL bound 
variable x to represent an object-language variable). 

We define a predicate Abstr to recognize the syntactic dB-functions, which formally defines the 
so-far only informally identified set. We also define an auxiliary predicate ordinary needed in the definition 
of Abstr: 

Definition 5 

ordinary :: (β ⇒ α dB) ⇒ bool 
ordinary X ≡ (∃ a. X = (λ x. CON' a)) ∨ (∃ n. X = (λ x. VAR' n)) ∨ 
             (∃ S T. X = (λ x. S x $$' T x)) ∨ (X = (λ x. ERR')) ∨ 
             (∃ j. X = (λ x. BND' j)) ∨ (∃ S. X = (λ x. ABS' (S x))) 

Definition 6 
function Abstr :: (α expr ⇒ α dB) ⇒ bool 
Abstr (λ x. s) = True    where s is (CON' a), (VAR' n), ERR', or (BND' i) 
Abstr (λ x. S x $$' T x) = (Abstr S ∧ Abstr T) 
Abstr (λ x. ABS' (S x)) = Abstr S 
¬ ordinary S ⟹ Abstr S = (S = dB) 

Syntactically, the defining equations for Abstr have the form of recursion on the body of a λ-abstraction. 
Mathematically, they define (Abstr S) by recursion on the common structure of all the values of the 
function S, i.e., on the common structure (if any) of (S x) for all x :: expr. The predicate ordinary recognizes 
those functions that match one of the first three equations, so that the condition (¬ ordinary S) on the 
last equation may be read as "otherwise"; that equation corresponds to the variable case for syntactic 
dB-terms as defined above. 

This definition is formalized with the help of Isabelle/HOL's function command. It demands proofs 
of pattern completeness, compatibility, and termination (not shown), and then in addition to defining 
Abstr and proving its defining equations, it automatically generates structural induction and 
case-distinction rules for the type (expr ⇒ dB) corresponding to the pattern of recursion used in the definition; 
these are called Abstr.induct and Abstr.cases respectively, and will be referred to later. 

We may now define the predicate abstr in terms of Abstr by using post-composition 
with dB to convert its function argument from the type (expr ⇒ expr) to (expr ⇒ dB). 

Definition 7 

abstr :: (α expr ⇒ α expr) ⇒ bool 
abstr S ≡ Abstr (dB ∘ S) 

Note that unlike the situation in [1], the definition of Abstr does not need to impose a constraint on 
the argument of BND', because in the case of (abstr S) dangling indices are excluded by the type of the 
function S :: (expr ⇒ expr). 

Lemma 8 

Abstr_const: Abstr (λ x. s) 

The lemma Abstr_const shows that any constant function of type (expr ⇒ dB) satisfies Abstr. It is 
used to prove a similar property for abstr, and will later be used directly as well. It is proved by induction 
on s using Definition 6 (Abstr). 

Lemma 9 

abstr_id: abstr (λ x. x) 
abstr_const: abstr (λ x. s) 

abstr_APP: abstr (λ x. S x $$ T x) = (abstr S ∧ abstr T) 

The lemma abstr_const is a corollary of Abstr_const, while the other two lemmas are proved directly, 
using Definitions 7 (abstr) and 6 (Abstr). 

These lemmas allow abstr conditions for syntactic functions to be proved compositionally without 
unfolding the definition, except when the body of the function contains a LAM subterm that involves 
the function argument (so that it is not just a constant). In that case, previous versions of Hybrid re- 
quired unfolding the definitions of abstr and LAM to convert HOAS to de Bruijn syntax. The present 
work improves on that situation by providing a compositional rule also for the LAM case (Lemma 20 in 
Sect. 7). 

The lemma abstr_const will be important for Hybrid terms with nested LAM operators, to show that 
the argument of an inner LAM satisfies abstr when its body contains a bound variable from an outer 
LAM; such a bound variable is a placeholder for an arbitrary term of type expr, which is exactly the role 
of s in abstr_const. 

We now define the function LAM, using the same form of recursion that was used in the definition of 
abstr. 

Definition 10 

LAM :: (α expr ⇒ α expr) ⇒ α expr 
LAM S ≡ expr (Lambda (dB ∘ S)) 

Lambda :: (α expr ⇒ α dB) ⇒ α dB 
Lambda S ≡ if (Abstr S) then (ABS' (Lbind 0 S)) else ERR' 

The function LAM, like abstr, first composes dB with the given function. It then applies the auxiliary 
function Lambda and converts the resulting term from type dB to type expr. 

The function Lambda first checks if its argument satisfies Abstr, and produces ERR' if not. (This is 
equivalent to checking if the argument of LAM satisfies abstr.) The original version of Hybrid [1] did 
not do this check (and did not have the constant ERR'), making it impossible to determine from (LAM S) 
whether S was a syntactic function or not. We include these features to support the stronger injectivity 
property for LAM proved in Sect. 6. 

If its argument does satisfy Abstr, then Lambda applies another auxiliary function Lbind, defined by 
recursion, to convert HOAS to de Bruijn syntax; i.e., to convert the variable represented by the function 
argument into a dangling de Bruijn index. It then applies a new ABS' node to bind the variable and obtain 
a proper de Bruijn term. 

Definition 11 
function Lbind :: [bnd, (α expr ⇒ α dB)] ⇒ α dB 
Lbind i (λ x. s) = s    where s is (CON' a), (VAR' n), ERR', or (BND' j) 
Lbind i (λ x. S x $$' T x) = Lbind i S $$' Lbind i T 
Lbind i (λ x. ABS' (S x)) = ABS' (Lbind (i + 1) S) 
¬ ordinary S ⟹ Lbind i S = BND' i 

The auxiliary function Lbind extracts the common structure of the values of its function argument, 
replacing indecomposable uses of the bound variable (i.e., functions that do not match any of the first 
three equations) with (BND' i). This is a dangling de Bruijn index, and i is incremented each time the 
recursion passes an ABS' node so that all such instances of BND' will refer to the ABS' node added by 
Lambda. The Abstr condition checked in the definition of Lambda ensures that the last equation will be 
applied only when S = (λ x. dB x). 
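
The index bookkeeping performed by Lbind can be imitated in Python, with one important simplification that is an assumption of this sketch: a Python function cannot be analysed by recursion on the common structure of its values, so the sketch instead applies the function to a fresh marker (the hypothetical class Hole, standing for the bound variable) and traverses the result. The Abstr check and the ERR' fallback of Lambda are not modelled, and all names below are illustrative.

```python
import itertools
from dataclasses import dataclass

# Hypothetical Python stand-ins for the constructors of the datatype dB.
@dataclass(frozen=True)
class Con:
    value: str

@dataclass(frozen=True)
class Var:
    n: int

@dataclass(frozen=True)
class App:
    left: object
    right: object

@dataclass(frozen=True)
class Bnd:
    index: int

@dataclass(frozen=True)
class Abs:
    body: object

@dataclass(frozen=True)
class Hole:
    """Marker for one HOAS-bound variable; not a constructor of dB."""
    ident: int

_fresh = itertools.count()

def lbind(i, hole, body):
    """Replace occurrences of `hole` by Bnd(i), bumping i under each Abs."""
    if body == hole:
        return Bnd(i)                                  # the variable case
    if isinstance(body, App):
        return App(lbind(i, hole, body.left), lbind(i, hole, body.right))
    if isinstance(body, Abs):
        return Abs(lbind(i + 1, hole, body.body))      # one binder deeper
    return body                                        # constants and other holes

def lam(f):
    """Sketch of LAM: probe f with a fresh marker, then bind it with Abs."""
    hole = Hole(next(_fresh))
    return Abs(lbind(0, hole, f(hole)))

identity = lam(lambda x: x)
nested = lam(lambda x: lam(lambda y: App(x, y)))
```

In the nested example the inner call to lam sees the outer variable as an opaque constant and leaves it untouched, which is the role Lbind_const plays for nested LAM operators; the outer call then turns it into index 1 after passing the inner Abs node.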

Lemma 12 

Lbind_const: Lbind i (λ x. s) = s 

The lemma Lbind_const shows that applying (Lbind i) to a constant function of type (expr ⇒ dB) 
gives the constant value of that function. It is proved by induction on s. This lemma will be important for 
Hybrid terms with nested LAM operators, to allow the argument of an outer LAM to satisfy abstr when 
its bound variable occurs in the scope of an inner LAM. 

Lemma 13 

dB_LAM: dB (LAM S) = if (abstr S) then (ABS' (Lbind 0 (dB ∘ S))) else ERR' 
abstr_dB_LAM: abstr S ⟹ dB (LAM S) = ABS' (Lbind 0 (dB ∘ S)) 

The lemma dB_LAM combines unfolding of Definition 10 (LAM and Lambda) with cancellation 
of the functions dB and expr, using the fact that both ERR' and (ABS' (Lbind 0 (dB ∘ S))) are proper. 
(Dangling indices are excluded from S :: (expr ⇒ expr) by its type, and the one introduced by Lbind is 
bound by the enclosing ABS'.) The lemma abstr_dB_LAM is a weaker version intended as a conditional 
rewrite rule for Isabelle's simplifier, to do the unfolding only if the abstr condition simplifies to True. 

With the definitions above, Hybrid terms using LAM (i.e., closed syntactic terms) are provably equal 
to the corresponding de Bruijn syntax representations, converted to the type expr using the function expr. 
(This is much the same situation as in [1], except for the type conversion which was not necessary there.) 
Thus, starting from two distinct representations for free variables, we have established two ambiguous 
representations for bound variables, in the sense that any given element of expr may be viewed as having 
either form. In the following sections, we will state results using the HOAS representation (LAM) but 
use the de Bruijn syntax representation (ABS'/BND') in proofs by induction, aiming to characterize the 
former representation so that it stands on its own. 

All versions of Hybrid have used essentially the same form of recursion to define abstr and LAM, 
and the corresponding form of induction to prove their properties. However, the means of formalizing it 
have varied greatly. The original version [1] used inductively-defined predicates and induction on those 
predicates; the following version [14] used primitive recursion and induction on an auxiliary datatype 
dB_fn; while the present version avoids many of the complications of the previous approaches with the 
help of the function command. 

A predicate called ordinary has also been present in all versions of Hybrid, though it originally 
included the variable case as well. Removing this case allowed ordinary to be generalized to dB-valued
functions on any type; this will allow us to reuse it for binary functions in Sect. 7. (It is also reused for
n-ary functions in [12, Sect. 3.3].)

6 Injectivity of "LAM" 

As stated in Sect. 2, Hybrid proves injectivity of LAM restricted to functions of type (expr ⇒ expr)
satisfying abstr. Improving on [1], this property is strengthened by requiring only one abstr premise, 
using the fact that LAM maps functions not satisfying abstr to a recognizable placeholder term ERR. 

We begin with an injectivity result for arbitrary de Bruijn levels. To state this result concisely, we 
first define an abbreviation Level for pointwise application of level to a function:

Definition 14
abbreviation Level :: [bnd, ('b ⇒ 'a dB)] ⇒ bool
Level i S = ∀ x. level i (S x)

Lemma 15 

Abstr_Lbind_inject:

⟦ Abstr S; Abstr T; Level i S; Level i T ⟧ ⟹ (Lbind i S = Lbind i T) = (S = T)

This lemma is proved by a straightforward induction on S :: (expr ⇒ dB) using Abstr.induct (from
Definition 6). 

Theorem 16 (Injectivity of LAM) 

⟦ LAM S = LAM T; abstr S ∨ abstr T ⟧ ⟹ S = T

Proof. If one of S and T satisfies abstr and the other does not, then by Lemma 13 (dB_LAM), one of 
the terms (dB (LAM S)) and (dB (LAM T)) is of the form (ABS' t) for some t :: dB, while the other 
is ERR'. But these terms cannot be equal, which contradicts the premise LAM S = LAM T. Thus the 
original assumption must be false, and we must have both (abstr S) and (abstr T). 

We apply dB to both sides of the equality LAM S = LAM T and simplify using abstr_dB_LAM 
(Lemma 13) to obtain 

ABS' (Lbind 0 (dB ∘ S)) = ABS' (Lbind 0 (dB ∘ T)).

ABS' is a datatype constructor and thus injective, so we may cancel it:

Lbind 0 (dB ∘ S) = Lbind 0 (dB ∘ T).

We have (Abstr (dB ∘ S)) and (Abstr (dB ∘ T)) by unfolding Definition 7 (abstr), and we also have
(Level 0 (dB ∘ S)) and (Level 0 (dB ∘ T)) since terms converted from type expr are proper by
Definition 2. Thus we may apply the preceding lemma (Abstr_Lbind_inject) to deduce dB ∘ S = dB ∘ T.
Since dB is injective, it can be canceled to obtain S = T, as was to be proven. □

Note that (Lbind 0) is only injective on functions from expr to dB whose values are proper terms, 
i.e., those that factor through dB, because any pre-existing dangling indices at level 1 would be
indistinguishable from those resulting from conversion of the HOAS variable. For example,

Lbind 0 (λ x. dB x) = BND' 0 = Lbind 0 (λ x. BND' 0).

Thus, without the typedef limiting expr to proper terms, we would not be able to avoid conditions on
both S and T; at best, we could replace one abstr condition with something like (∀ x. proper x ⟶
proper (T x)).
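The collision described above can be reproduced in a small Python model. This is only a sketch under assumptions: terms are tagged tuples, expr is identified with dB, "all arguments" is replaced by a finite list PROBES, and level i t is read as "every dangling index in t is below i", with proper meaning level 0. None of these names come from Hybrid itself.

```python
# Toy model showing why Lbind 0 needs the Level 0 side condition.
def CON(a): return ("CON", a)
def VAR(n): return ("VAR", n)
def BND(j): return ("BND", j)
def APP(s, t): return ("APP", s, t)
def ABS(s): return ("ABS", s)
ERR = ("ERR",)

ATOMIC = {"CON", "VAR", "BND", "ERR"}
PROBES = [CON("p"), VAR(7), ERR, ABS(BND(0))]   # assumed finite sample, all proper

def lbind(i, S):
    values = [S(p) for p in PROBES]
    first = values[0]
    if all(v[0] == first[0] for v in values):
        if first[0] in ATOMIC and all(v == first for v in values):
            return first
        if first[0] == "APP":
            return APP(lbind(i, lambda x: S(x)[1]), lbind(i, lambda x: S(x)[2]))
        if first[0] == "ABS":
            return ABS(lbind(i + 1, lambda x: S(x)[1]))
    return BND(i)

def level(i, t):
    """Assumed reading of level: every dangling index in t is below i."""
    if t[0] == "BND":
        return t[1] < i
    if t[0] == "APP":
        return level(i, t[1]) and level(i, t[2])
    if t[0] == "ABS":
        return level(i + 1, t[1])
    return True

def Level(i, S):
    """Pointwise level, checked on PROBES only."""
    return all(level(i, S(p)) for p in PROBES)

variable = lambda x: x          # the HOAS variable
dangling = lambda x: BND(0)     # a constant function returning a dangling index

print(lbind(0, variable), lbind(0, dangling))
print(Level(0, variable), Level(0, dangling))
```

Both functions are mapped to the same de Bruijn index, but only the first satisfies the pointwise level condition, which is the premise that excludes the second function in Abstr_Lbind_inject.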

The advantage of an injectivity property that can work with a condition on only one of S and T is 
that it simplifies the elimination rules for inductively-defined predicates on Hybrid terms, such as the 
formalization of evaluation for Mini-ML with references in [12, Sect. 5.3]. As a result, abstr conditions 
are more often available where they are needed, without having to add them as premises. 

Distinctness of LAM from the first-order operators of Definition 3 follows straightforwardly from 
Definition 10, except that (LAM F) is distinct from ERR only under the premise (abstr F).

7 Characterizing "abstr"

In Sect. 5, an incomplete set of simplification rules for abstr was provided as Lemma 9. The missing 
case is (abstr (λ x. LAM y. W x y)).

Both previous versions of Hybrid [1, 14] relied on conversion from HOAS to de Bruijn syntax to 
handle this case. That is sufficient for proving that particular syntactic functions satisfy abstr, but it is 
less useful for partially-specified functions as found in inductive proofs.

We could obtain a compositional introduction rule for this case by defining a predicate biAbstr ::
([expr, expr] ⇒ expr) ⇒ bool generalizing abstr, and proving

biAbstr W ⟹ abstr (λ x. LAM y. W x y).

This was done by Momigliano et al. [13]; their formal theory BiAbstr is available online [6]. However, 
the LAM case arises again for biAbstr, and for any higher-arity generalization. There are several ways to 
address this: 

• Use Isabelle/HOL's axiomatic type classes to define a polymorphic predicate generalizing abstr to 
curried functions of arbitrary arity. This looks like a promising approach, but it remains as future
work. 

• Find a single type that can represent functions of arbitrary arity, and generalize Hybrid's constructs 
to that type. (Some experimental work has been done in that direction [12, Sect. 3.3].) Such a type 
is also useful as a representation of open terms for induction. 

• Prove a result that reduces biAbstr to abstr. This seems to be the most direct solution, and it is the 
approach we take in the present work. 

In this section, we will represent functions of two arguments using pairs, rather than in the usual 
curried form, so that we may reuse Definition 5 (ordinary) and some technical lemmas (left unstated as 
they are mathematically trivial), all of which refer to the polymorphic type ('b ⇒ dB).

Definition 17 

abstr_2 :: ('a expr × 'a expr ⇒ 'a expr) ⇒ bool
abstr_2 S = Abstr_2 (dB ∘ S)

The predicate abstr_2 generalizes abstr to functions on the Cartesian product type (expr × expr); it
corresponds to biAbstr [13]. It is defined in the same way as abstr, composing dB with its argument and 
then applying a recursively-defined auxiliary predicate Abstr_2. 

Definition 18 

function Abstr_2 :: ('a expr × 'a expr ⇒ 'a dB) ⇒ bool
Abstr_2 (λ p. s) = True    where s is (CON' a), (VAR' n), ERR', or (BND' i)
Abstr_2 (λ p. S p $$' T p) = (Abstr_2 S ∧ Abstr_2 T)
Abstr_2 (λ p. ABS' (S p)) = Abstr_2 S
¬ ordinary S ⟹ Abstr_2 S = (S = dB ∘ fst ∨ S = dB ∘ snd)

The predicate Abstr_2 is similar to Abstr, except that it has two variable cases: (dB ∘ fst) and
(dB ∘ snd), or equivalently, (λ (x, y). dB x) and (λ (x, y). dB y).

Lemma 19 

abstr_2 S = ((∀ y. abstr (λ x. S (x, y))) ∧ (∀ x. abstr (λ y. S (x, y))))

This lemma shows that if a two-argument function satisfies abstr in each argument for any fixed
value of the other argument, then it satisfies abstr_2. The converse, which is easier, also holds. We omit the
formal proof, but note that it is fairly long and requires several lemmas.

Having thus reduced abstr_2 to componentwise abstr, we may now derive the desired simplification
rule for the case (abstr (λ x. LAM y. W x y)).

Lemma 20 

abstr_LAM: ∀ x. abstr (λ y. W x y) ⟹
    abstr (λ x. LAM y. W x y) = (∀ y. abstr (λ x. W x y))

This lemma provides a compositional rule for proving abstr conditions on functions of the form
(λ x. LAM y. W x y), via the reverse direction of the biconditional. Both directions are also used in the
proof of adequacy. It was proved with the help of (a variant of) Lemma 19. 

We consider a small example, the term (LAM x. LAM y. x $$ y), illustrating abstr_LAM by proving 
that the argument of the outer LAM satisfies abstr, without the use of de Bruijn syntax: 

∀ x. abstr (λ y. x)                  (by abstr_const)
abstr (λ y. y)                       (by abstr_id)
∀ x. abstr (λ y. (x $$ y))           (by abstr_APP)
abstr (λ x. x)                       (by abstr_id)
∀ y. abstr (λ x. y)                  (by abstr_const)
∀ y. abstr (λ x. (x $$ y))           (by abstr_APP)
abstr (λ x. LAM y. (x $$ y))         (by abstr_LAM)
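The statement of abstr_LAM can also be checked on examples in a toy executable model. The Python sketch below is an illustration under assumptions, not Hybrid's implementation: expr is identified with dB, terms are tagged tuples, and every universal quantifier ranges over a fixed finite list PROBES. The function abstr_LAM_sides and the helper shape are hypothetical names; the function returns the premise and both sides of the equation for a given W.

```python
# Toy check of abstr_LAM: premise, left-hand side and right-hand side for a given W.
def CON(a): return ("CON", a)
def VAR(n): return ("VAR", n)
def BND(j): return ("BND", j)
def APP(s, t): return ("APP", s, t)
def ABS(s): return ("ABS", s)
ERR = ("ERR",)

ATOMIC = {"CON", "VAR", "BND", "ERR"}
PROBES = [CON("p"), VAR(7), ERR, ABS(BND(0))]   # assumed finite sample of arguments

def shape(S):
    values = [S(p) for p in PROBES]
    first = values[0]
    if all(v[0] == first[0] for v in values):
        if first[0] in ATOMIC and all(v == first for v in values):
            return ("atom", first)
        if first[0] == "APP":
            return ("APP", lambda x: S(x)[1], lambda x: S(x)[2])
        if first[0] == "ABS":
            return ("ABS", lambda x: S(x)[1])
    return ("other",)

def lbind(i, S):
    kind, *parts = shape(S)
    if kind == "atom":
        return parts[0]
    if kind == "APP":
        return APP(lbind(i, parts[0]), lbind(i, parts[1]))
    if kind == "ABS":
        return ABS(lbind(i + 1, parts[0]))
    return BND(i)

def abstr(S):
    kind, *parts = shape(S)
    if kind == "atom":
        return True
    if kind in ("APP", "ABS"):
        return all(abstr(part) for part in parts)
    return all(S(p) == p for p in PROBES)       # variable case only

def LAM(S):
    return ABS(lbind(0, S)) if abstr(S) else ERR

def abstr_LAM_sides(W):
    premise = all(abstr(lambda y, x=x: W(x, y)) for x in PROBES)
    lhs = abstr(lambda x: LAM(lambda y: W(x, y)))
    rhs = all(abstr(lambda x, y=y: W(x, y)) for y in PROBES)
    return premise, lhs, rhs

# W x y = x $$ y, the example treated above.
print(abstr_LAM_sides(lambda x, y: APP(x, y)))
# A W that ignores y but inspects x: the premise holds, both sides fail.
print(abstr_LAM_sides(lambda x, y: CON("a") if x == ERR else CON("b")))
```

In both cases the two sides of the equation agree once the premise holds, which is what the lemma predicts; the second case shows the rule rejecting a function that is well behaved in y but not in x.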

Not only does the lemma abstr_LAM allow abstr statements to be proved without the use of de Bruijn 
syntax, but it also completes the task of characterizing expr on its own terms - that is, without reference to 
the underlying de Bruijn syntax. This is demonstrated in [12] by the fact that representational adequacy 
follows from Hybrid's lemmas concerning the type expr, and it is a significant improvement over both 
previous versions of Hybrid [1, 14]. 

We also obtain the characterization of abstr stated in Sect. 2 as a corollary of abstr_LAM: 

Lemma 21 

abstr Y = ((Y = (λ x. x)) ∨
    (∃ a. Y = (λ x. CON a)) ∨ (∃ n. Y = (λ x. VAR n)) ∨
    (∃ S T. abstr S ∧ abstr T ∧ Y = (λ x. S x $$ T x)) ∨
    (∃ W. (∀ x. abstr (λ y. W x y)) ∧ (∀ y. abstr (λ x. W x y)) ∧
        Y = (λ x. LAM y. W x y)) ∨ (Y = (λ x. ERR)))



8 Conclusion 

Hybrid is the first approach to formalizing variable-binding constructs that is both based on full HOAS 
and is built definitionally in a general-purpose proof assistant (Isabelle/HOL). More recently, Popescu et al.
have developed an approach motivated by a new proof of strong normalization for System F that takes
advantage of HOAS techniques [21]. It is also definitional, implements full HOAS, and is implemented 
in Isabelle/HOL, though the details of the formalizations as well as the case studies carried out in each 
system are quite different. A more in-depth comparison is the subject of future work. 

There are many other related approaches, and we mention only a few here. See [8, 12] for a fuller 
discussion. Systems that implement logics designed specifically for reasoning using HOAS include
Twelf [18] (one of the most mature systems in this category), Abella [10], and Beluga [19]. These
systems have the advantage of being purpose-built for reasoning about formal systems, but this can also 
be a disadvantage in that they cannot exploit the extensive libraries of formalized mathematics available 
for proof assistants such as Isabelle/HOL. For a comparison of Hybrid to Twelf and Beluga, see [7]. The 
nominal datatype package [22] implements a different approach which seeks to formalize equivalence
classes of terms up to renaming of bound variables, and also the Barendregt variable convention, using
concepts from nominal logic [9, 20].

There are several versions of Hybrid based on the Coq proof assistant. One such version [8] closely 
follows the structure of the Isabelle/HOL version; another implements a constructive variant of Hybrid 
for Coq [3] that aims to leverage the use of dependent types to simplify and provide new ways to specify 
OLs. There have also been a number of applications and case studies for Hybrid, the largest being the 
comparison of five formalizations of subject reduction for Mini-ML with references [12], which uses the 
improved Hybrid described in this paper. Future work includes porting other applications to use the new 
Hybrid. This will be straightforward since they are simpler and will be further simplified by the new 
interface. Future work also includes carrying out new case studies to further illustrate the benefits of the 
new Hybrid. 

Although we have significantly improved Hybrid, there is always room for further improvement. For 
example, the induction principle discussed at the end of Sect. 2 (the one whose LAM case is displayed) 
falls back to named (or numbered) variables for inductive proofs, which means giving up some of the 
advantages of HOAS. We are working on a more general approach to induction that preserves the HOAS 
feature of substitution by function application. In fact, we have proved an induction principle for a type 
that represents n-ary functions on the type expr [12], which we hope will serve as the basis for general
induction principles for HOAS in Hybrid. Its integration into Hybrid remains as future work. As another 
example, we mentioned that Hybrid is untyped, requiring predicates to be introduced to distinguish 
different kinds of OL terms encoded into expr. On one hand, these well-formedness predicates can 
provide a convenient form of induction within the context of the two-level approach; on the other hand 
this is a potential area for improvement. Some work in this direction has been done in the Coq version 
of Hybrid [4].